NOTE: Either Arial Narrow or Roboto Condensed fonts are required to use these themes.
Please use hrbrthemes::import_roboto_condensed() to install Roboto Condensed and
if Arial Narrow is not on your system, please see https://bit.ly/arialnarrow
Rows: 50147 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): game_id, posteam
dbl (8): play_id, drive, week, qtr, down, half_seconds_remaining, pass, wp
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
rmarkdown::paged_table(NFL2022_stuffs)
skim(NFL2022_stuffs) %>% select(-n_missing)
Data summary
Name                     NFL2022_stuffs
Number of rows           50147
Number of columns        10
_______________________
Column type frequency:
  character              2
  numeric                8
________________________
Group variables          None

Variable type: character
skim_variable  complete_rate  min  max  empty  n_unique  whitespace
game_id                 1.00   13   15      0       284           0
posteam                 0.93    2    3      0        32           0

Variable type: numeric
skim_variable           complete_rate     mean       sd  p0      p25      p50      p75  p100  hist
play_id                          1.00  2057.86  1194.22   1  1039.00  2034.00  3065.50  5523  ▇▇▇▅▁
drive                            0.99    11.48     6.59   1     6.00    11.00    17.00    35  ▇▇▇▂▁
week                             1.00     9.91     5.61   1     5.00    10.00    15.00    22  ▇▆▆▆▃
qtr                              1.00     2.58     1.14   1     2.00     3.00     4.00     5  ▆▇▆▇▁
down                             0.83     2.00     1.00   1     1.00     2.00     3.00     4  ▇▆▁▃▂
half_seconds_remaining           1.00   796.94   564.41   0   255.00   774.00  1285.00  1800  ▇▅▅▅▅
pass                             1.00     0.45     0.50   0     0.00     0.00     1.00     1  ▇▁▁▁▆
wp                               0.99     0.51     0.29   0     0.29     0.52     0.73     1  ▆▆▇▆▆
In the data.frame NFL2022_stuffs, remove observations for which the value of posteam is missing.
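One possible way to do this filtering in base R is sketched here on a small made-up data frame. The name `toy` and all of its values are illustrative assumptions, not the real NFL2022_stuffs data; the same subsetting expression would apply to the real data.frame.

```r
# Toy stand-in for NFL2022_stuffs (values are made up for illustration)
toy <- data.frame(
  game_id = c("g1", "g1", "g2"),
  posteam = c("BUF", NA, "KC"),
  pass    = c(1, 0, 1)
)

# Keep only the rows where posteam is not missing
toy_clean <- toy[!is.na(toy$posteam), ]

nrow(toy_clean)
```

Since skim() reports a complete_rate of 0.93 for posteam, roughly 7% of the rows in the real data would be dropped by this step.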
q2b
Summarize the mean value of pass for each posteam when all of the following conditions hold:
1. wp is greater than 20% and less than 75%
2. down is less than or equal to 2
3. half_seconds_remaining is greater than 120
posteam mean_pass
1 ARI 0.5528455
2 ATL 0.4000000
3 BAL 0.5198330
4 BUF 0.6043956
5 CAR 0.4578947
6 CHI 0.4198312
7 CIN 0.6567460
8 CLE 0.4908722
9 DAL 0.4742647
10 DEN 0.4930796
11 DET 0.4906542
12 GB 0.5088496
13 HOU 0.4793388
14 IND 0.4938525
15 JAX 0.5207921
16 KC 0.6376068
17 LA 0.5104895
18 LAC 0.6076190
19 LV 0.4921569
20 MIA 0.5334646
21 MIN 0.5555556
22 NE 0.5208333
23 NO 0.4214464
24 NYG 0.5153846
25 NYJ 0.5061728
26 PHI 0.5801217
27 PIT 0.4796296
28 SEA 0.5662188
29 SF 0.4805726
30 TB 0.5529412
31 TEN 0.4342723
32 WAS 0.4054581
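A summary like the one above could be computed along the following lines. This is a minimal sketch on a made-up data frame (`toy` and `mean_pass_toy` are hypothetical names, and the values are invented), not necessarily the code that produced the table; it assumes wp is stored as a proportion between 0 and 1, as the skim() output suggests.

```r
# Toy stand-in for NFL2022_stuffs (values are made up for illustration)
toy <- data.frame(
  posteam = c("BUF", "BUF", "BUF", "KC", "KC", "KC"),
  wp      = c(0.50, 0.60, 0.90, 0.30, 0.70, 0.40),
  down    = c(1, 2, 1, 1, 3, 2),
  half_seconds_remaining = c(900, 600, 800, 100, 500, 700),
  pass    = c(1, 0, 1, 0, 1, 1)
)

# All three conditions must hold; the is.na() checks drop rows with
# missing wp or down instead of letting NA leak into the row index
keep <- !is.na(toy$wp) & toy$wp > 0.20 & toy$wp < 0.75 &
  !is.na(toy$down) & toy$down <= 2 &
  toy$half_seconds_remaining > 120

# Mean of pass per team over the rows that passed the filter
mean_pass_toy <- aggregate(pass ~ posteam, data = toy[keep, ], FUN = mean)
names(mean_pass_toy)[2] <- "mean_pass"

mean_pass_toy
```

Because pass is a 0/1 indicator, its mean is the share of qualifying plays that were passes, so the values are proportions rather than percentages.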
#q2c Provide both (1) ggplot code with geom_point() using the resulting data.frame from Q2b and (2) a simple comment describing the mean value of pass for each posteam. In the ggplot, reorder the posteam categories by the mean value of pass, in ascending or descending order.
library(ggplot2)

# Order the team factor levels by mean pass rate so the points plot in ascending order
mean_pass_by_posteam$posteam <- factor(
  mean_pass_by_posteam$posteam,
  levels = mean_pass_by_posteam$posteam[order(mean_pass_by_posteam$mean_pass)]
)

ggplot(mean_pass_by_posteam, aes(x = mean_pass, y = posteam)) +
  geom_point() +
  labs(x = "Percent of Pass Plays", y = "Team with Possession",
       title = "Team vs Percent of Pass Plays") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Rows: 46427 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): game_id, posteam, receiver, passer
dbl (3): play_id, drive, epa
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
rmarkdown::paged_table(NFL2022_epa)
Create the data.frame, NFL2022_stuffs_EPA, that includes:
All the variables in the data.frame, NFL2022_stuffs
The variables, passer, receiver, and epa, from the data.frame, NFL2022_epa by joining the two data.frames
In the resulting data.frame, NFL2022_stuffs_EPA, remove observations with NA in passer
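The join and filter described in the steps above might look like the following in base R. This is a sketch under assumptions: the two small data frames `stuffs` and `epa` are made-up stand-ins, the receiver names are placeholders, and joining on game_id and play_id is an assumed choice of key (both files share those columns, but the source does not state which keys to use).

```r
# Toy stand-ins for NFL2022_stuffs and NFL2022_epa (made-up values)
stuffs <- data.frame(
  game_id = c("g1", "g1", "g2"),
  play_id = c(1, 2, 1),
  posteam = c("BUF", "BUF", "KC"),
  week    = c(1, 1, 1)
)
epa <- data.frame(
  game_id  = c("g1", "g2"),
  play_id  = c(1, 1),
  passer   = c("J.Allen", "P.Mahomes"),
  receiver = c("WR.A", "WR.B"),  # placeholder names
  epa      = c(0.8, 1.2)
)

# Left join: keep every row of stuffs and bring in only the three new columns
joined <- merge(
  stuffs,
  epa[, c("game_id", "play_id", "passer", "receiver", "epa")],
  by = c("game_id", "play_id"),
  all.x = TRUE
)

# Drop plays with no recorded passer
joined <- joined[!is.na(joined$passer), ]

joined
```

Selecting only passer, receiver, and epa (plus the keys) from the second data.frame avoids duplicated columns such as posteam.x and posteam.y in the result.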
#q2e Provide both (1) a single ggplot and (2) a simple comment describing the weekly trend in the mean value of epa for each of the following two passers: 1. "J.Allen" 2. "P.Mahomes"
library(ggplot2)

selected_passers <- c("J.Allen", "P.Mahomes")
filtered_data <- NFL2022_stuffs_EPA[NFL2022_stuffs_EPA$passer %in% selected_passers, ]

# Sort the levels so weeks appear in calendar order regardless of row order
filtered_data$week <- factor(filtered_data$week, levels = sort(unique(filtered_data$week)))

# Mean epa per week and passer
weekly_mean_epa <- aggregate(epa ~ week + passer, data = filtered_data, FUN = mean)

ggplot(weekly_mean_epa, aes(x = week, y = epa, color = passer, group = passer)) +
  geom_line() +
  labs(x = "Week", y = "Weekly Mean EPA",
       title = "NFL Weekly Trend of Mean EPA for J.Allen and P.Mahomes") +
  theme_minimal()
Patrick Mahomes generally has a higher weekly mean epa than Josh Allen: of the 16 weeks in which both passers have data, Mahomes's mean is higher in 11.
q2f
Calculate the difference between the mean value of epa for "J.Allen" and the mean value of epa for "P.Mahomes" for each value of week.
selected_passers <- c("J.Allen", "P.Mahomes")
filtered_data <- NFL2022_stuffs_EPA[NFL2022_stuffs_EPA$passer %in% selected_passers, ]

# Mean epa per week and passer
mean_epa_by_week <- aggregate(epa ~ week + passer, data = filtered_data, FUN = mean)

# One row per week, one epa column per passer
epa_diff <- reshape(mean_epa_by_week, idvar = "week", timevar = "passer", direction = "wide")

# J.Allen minus P.Mahomes; NA when either passer has no plays that week
epa_diff$epa_diff <- epa_diff$epa.J.Allen - epa_diff$epa.P.Mahomes
print(epa_diff)
week epa.J.Allen epa.P.Mahomes epa_diff
1 1 0.52963415 0.69840404 -0.16876989
2 2 0.48691617 0.14841216 0.33850401
3 3 0.16932725 0.24559401 -0.07626677
4 4 0.19104682 0.27137549 -0.08032867
5 5 0.62742248 0.30228470 0.32513777
6 6 0.30652151 0.13313721 0.17338430
7 8 0.22419910 NA NA
8 9 -0.20799939 0.09646711 -0.30446651
9 10 0.16051785 0.58904325 -0.42852541
10 11 0.19206366 0.36503570 -0.17297205
11 12 0.09828258 0.24726968 -0.14898710
12 13 0.33021344 0.20622354 0.12398990
13 14 -0.06207961 0.13106472 -0.19314433
14 15 0.25693067 0.32195856 -0.06502788
15 16 0.02143551 0.12156763 -0.10013212
16 18 0.20865931 0.17297609 0.03568322
17 19 -0.20950326 NA NA
18 20 -0.04289048 0.27933023 -0.32222071
25 7 NA 0.70130690 NA
34 17 NA 0.19847047 NA
37 21 NA 0.19610416 NA
38 22 NA 0.55937371 NA
q2g
Summarize the resulting data.frame in Q2d, with the following four variables:
posteam: String abbreviation for the team with possession.
passer: Name of the player who passed a ball to a receiver by initially taking a three-step drop, and backpedaling into the pocket to make a pass. (Mostly, they are quarterbacks.)
mean_epa: Mean value of epa in 2022 for each passer
n_pass: Number of observations for each passer
Then find the top 10 NFL passers in 2022 in terms of the mean value of epa, subject to the condition that n_pass is greater than or equal to the third quartile of n_pass.
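The summary and ranking described here could be sketched in base R as follows. The data frame `toy`, the passer "Backup.QB", and all epa values are invented for illustration; on the real data the same steps would be applied to NFL2022_stuffs_EPA.

```r
# Toy stand-in for NFL2022_stuffs_EPA (made-up values)
toy <- data.frame(
  posteam = c(rep("BUF", 4), rep("KC", 3), "BUF"),
  passer  = c(rep("J.Allen", 4), rep("P.Mahomes", 3), "Backup.QB"),
  epa     = c(0.2, 0.4, -0.1, 0.3, 0.5, 0.1, 0.3, 1.5)
)

# Mean epa per team and passer
mean_epa <- aggregate(epa ~ posteam + passer, data = toy, FUN = mean)
names(mean_epa)[3] <- "mean_epa"

# Number of observations per team and passer
# (the formula interface drops rows with NA epa before counting)
n_pass <- aggregate(epa ~ posteam + passer, data = toy, FUN = length)
names(n_pass)[3] <- "n_pass"

passer_summary <- merge(mean_epa, n_pass, by = c("posteam", "passer"))

# Keep passers at or above the third quartile of n_pass, then rank by mean_epa
q3 <- quantile(passer_summary$n_pass, 0.75)
eligible <- passer_summary[passer_summary$n_pass >= q3, ]
top10 <- head(eligible[order(-eligible$mean_epa), ], 10)

top10
```

The n_pass threshold matters: in the toy data the hypothetical backup has the highest mean epa from a single play, and the third-quartile condition is what keeps such small-sample passers out of the ranking.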